From Tags to Topic Maps: Using Marked-up Hebrew Text to Discover Linguistic Patterns
Abstract
The paper discusses a series of related techniques that prepare and transform raw linguistic data for advanced processing in order to unveil hidden grammatical patterns. It identifies XML as a suitable mark-up language to build an exploitable data bank of multi-dimensional data in the Hebrew text of the Old Testament. This concept is illustrated by tagging a transcription of Gen. 1:1-2:3 and manipulating this data bank. Transferring the data into a three-dimensional array allows advanced processing of the data in order to either confirm existing knowledge or to mine for new, yet undiscovered, linguistic features. Visualisation is discussed as a technique that enhances interaction between the human researcher and the computerised technologies supporting this process of knowledge creation. The empirical study is a small experiment that illustrates the viability and usefulness of the proposed expert devices as well as the benefits of applying information system techniques to linguistic databases.
Similar resources
Visualizing Mappings of Semantic and Syntactic Functions
This paper investigates the visualization of the mapping of semantic and syntactic functions that were marked up in an XML-database containing linguistic data of the Biblical Hebrew text of Genesis 1:1-2:3. It focuses on two-dimensional topic maps as a graphical data-mining utility. The visual information is used to prompt the reconsideration of some existing assumptions and hypotheses about Bib...
Entities as topic labels: Improving topic interpretability and evaluability combining Entity Linking and Labeled LDA
Hurvitz, A. (2013). Late Biblical Hebrew, Khan. Khan, G. (ed.) (2013). Encyclopedia of Hebrew Language and Linguistics, Vol. 4, Leiden, Brill, 2013. Kutscher, E. Y. (1974). The Language and Linguistic Background of the Isaiah Scroll (1QIsaa), STDJ 6. Leiden, Brill. Oosting, R., Dyk, J. and Glanz, O., Valence Patterns of Motion Verbs, Semantics, Syntax and Linguistic Variation, to be published. ...
A review of text mining approaches and their function in discovering and extracting a topic
Background and aim: Four text mining methods are examined, with a focus on understanding and identifying their properties and limitations in subject discovery. Methodology: The study is an analytical review of the literature of text mining and topic modeling. Findings: LSA could be used to classify specific and unique topics in documents that address only a single topic. The other three text min...
A Comprehensive NLP System for Modern Standard Arabic and Modern Hebrew
This paper presents a comprehensive NLP system by Melingo, recently developed for Arabic and based on Morfix, an operational and highly successful comprehensive Hebrew NLP system developed earlier. The system discussed includes modules for morphological analysis, context-sensitive lemmatization, vocalization, text-to-phoneme conversion, and syntactic-analysis-based prosody (intonation) ...
Topic Modeling and Classification of Cyberspace Papers Using Text Mining
The global cyberspace networks provide individuals with platforms where they can interact, exchange ideas, share information, provide social support, conduct business, create artistic media, play games, engage in political discussions, and much more. The term cyberspace has become a conventional means to describe anything associated with the Internet and the diverse Internet culture. In fact, cyberspac...
Publication date: 2017